-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add internal API and omdb command to expunge a sled #5234
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good overall, thanks for doing it! Just a few comments.
log: &slog::Logger, | ||
) -> anyhow::Result<Arc<DataStore>> { | ||
let db_url = self.resolve_pg_url(omdb, log).await?; | ||
eprintln!("note: using database URL {}", &db_url); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this use the logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's supposed to, but I'm not positive. Probably a question for @davepacheco; I think the logger is only passed into all the omicron types that need one, and output from omdb
itself goes to stdout or stderr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I think one way to handle this would be to have some kind of tag passed in as a kv to indicate that the message is user-facing, and in the log handler do filtering based on that. (You can also show such messages in a different manner, e.g. "warning: <text>" instead of [timestamp WARN] text
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of very many things that get emitted directly to stdout/stderr. That's basically how omdb works today. There are definitely tradeoffs to doing it that way but I don't think it makes sense to change just this one and rethinking this approach seems a lot broader than this PR.
// This is an extremely dangerous and irreversible operation. We put a | ||
// couple of safeguards in place to ensure this cannot be called without | ||
// due consideration: | ||
// | ||
// 1. We'll require manual input on stdin to confirm the sled to be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine for now, but I kind of expect we're going to want to write automated tests around this sort of thing (maybe not using omdb, though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed. @benjaminleonard made some suggestions about how we might design the external API to mimic this kind of confirmation flow, but I did not implement any of that for the internal API. I put these guards on omdb, but (a) we can relax them (or provide flags to skip them) if needed, or (b) tests may be able to hit the internal API directly.
let opctx = OpContext::for_tests(log.clone(), datastore.clone()); | ||
let opctx = &opctx; | ||
|
||
// First, we need to look up the sled so we know its serial number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean it won't work for PCs? Or we just won't necessarily be able to do the inventory collection check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit I don't know enough about how PC-based racks are set up to answer this question. I think PCs still have a serial number in the sled
table, right? If their sled-agent reports the same serial number, I think both this and the inventory check will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should still work on PCs. This is how we are able to add and remove sleds on the a4x2 testbed.
let (_authz_sled, sled) = LookupPath::new(opctx, &datastore) | ||
.sled_id(args.sled_id) | ||
.fetch() | ||
.await | ||
.with_context(|| format!("failed to find sled {}", args.sled_id))?; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only reason we need the database connection? What about having an API instead to fetch sled info? (This is not a big deal either way.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use it both for this and fetching the latest inventory, which doesn't have APIs either (but could). Using the database connection in this subcommand seems a little sketchy to me, but I wasn't sure about adding sled info + inventory internal API commands just to support guard rails. I'll keep this as-is for now but it's easy to revisit (or replace if/when we develop those APIs for some other consumers).
The omdb interface might be overly cautious; feedback welcome. You can try this against
omicron-dev run-all
; it looks roughly like:This handles the "internal" half of #5134